Classify ClickHouse NO_COMMON_TYPE (386) as unsafe#1380
Conversation
Admins running SQL via /internal/analytics/query were getting a generic "unknown Clickhouse error" with a Sentry capture when their query hit a type mismatch (e.g. `SELECT [1, 'a']`). 386 is triggered by user-authored SQL, so surface the real message and add a regression test confirming no row data leaks when the same error fires against internal cross-project tables.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughThe PR classifies ClickHouse error code 386 ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts (1)
235-258: LGTM — solid leak-prevention regression test.The approach is sound: the query deliberately forces a type mismatch on a real column of the cross-project
analytics_internal.userstable, so NO_COMMON_TYPE fires during type analysis before any row is read. The combination of an explicit 400 +@absence +primary_emailassignment-pattern check + full inline snapshot provides reasonable defense-in-depth against future regressions where ClickHouse error formatting might evolve.Minor optional observation (non-blocking): the regex
/primary_email\s*[:=]\s*['"]/only flags quoted values. An unquoted value leak (e.g.primary_email = user@xin a hypothetical future message shape) would be caught by the@check but not by this regex. Tightening to e.g./primary_email\s*[:=]\s*\S/would make intent explicit, though it's not strictly necessary given the@check and the snapshot.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts` around lines 235 - 258, Update the negative regex test to also catch unquoted leaked values: replace the current /primary_email\s*[:=]\s*['"]/ check in the test "does not leak data from the internal cross-project users table via type-mismatch errors" with a pattern that matches any non-whitespace value (e.g. /primary_email\s*[:=]\s*\S/), so change the expect(...).not.toMatch call referencing that regex to use the new pattern; everything else in the test (status check, "@" check, snapshot) should remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts`:
- Around line 235-258: Update the negative regex test to also catch unquoted
leaked values: replace the current /primary_email\s*[:=]\s*['"]/ check in the
test "does not leak data from the internal cross-project users table via
type-mismatch errors" with a pattern that matches any non-whitespace value (e.g.
/primary_email\s*[:=]\s*\S/), so change the expect(...).not.toMatch call
referencing that regex to use the new pattern; everything else in the test
(status check, "@" check, snapshot) should remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 85447bd0-fd38-4fa2-8a7e-f2754ea848ae
📒 Files selected for processing (2)
apps/backend/src/lib/clickhouse-errors.tsapps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts
Greptile SummaryThis PR adds ClickHouse error code Confidence Score: 4/5Safe to merge; the classification change is well-reasoned and the only finding is a test coverage gap Only P2 findings are present. The core logic change (adding 386 to the safe list) is correct — the error is analysis-time only and cannot contain row values. The single concern is that the regression test covers an accessible table but not the scenario where 386 might fire before a 497 access-denied on restricted tables, which is the documented reason code 43 is classified as unsafe. apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts — consider adding a test targeting a restricted table to confirm the error ordering assumption Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Admin POST /internal/analytics/query] --> B[Execute SQL via ClickHouse readonly client]
B --> C{ClickHouse Result}
C -->|Success| D[Return rows to caller]
C -->|Error| E[getSafeClickhouseErrorMessage]
E --> F{Error code in SAFE list?}
F -->|Yes: 62,158,159,164,386,396,636| G[Return full error message to caller]
F -->|No: check UNSAFE list| H{Error code in UNSAFE list?}
H -->|Yes: 36,43,47,60,497| I[Return generic message in production]
H -->|No: unknown code| J[captureError to Sentry + return generic message]
G --> K[400 ANALYTICS_QUERY_ERROR with details]
I --> K
J --> K
Prompt To Fix All With AIThis is a comment left during a code review.
Path: apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts
Line: 235-258
Comment:
**Test gap: 386 may fire before permission checks on restricted tables**
The new test queries `analytics_internal.users`, which `limited_user` appears to have access to (no `497` is returned). However, the existing test suite explicitly documents that error code 43 (`ILLEGAL_TYPE_OF_ARGUMENT`) is classified as **unsafe** precisely because ClickHouse resolves types before checking permissions — allowing column type/name info to leak from restricted tables (see line 1653: querying `system.query_log` with a code-43-triggering query still returns column type info).
If 386 behaves the same way (type-checking happens before ACL enforcement), a query like `SELECT if(1, query, 1) FROM system.query_log` would return a 386 error revealing that `query` is of type `String` — schema info from a restricted table. The current test doesn't cover this case because it only exercises a table the caller can already access.
Consider adding a test that verifies what happens when a 386-triggering query targets a table `limited_user` definitely cannot access (e.g. `system.query_log`), to confirm that either: (a) a `497` fires instead of `386`, or (b) if 386 does fire, the revealed type info is acceptable given the schema is public for system tables.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Classify ClickHouse NO_COMMON_TYPE (386)..." | Re-trigger Greptile |
Type resolution happens before ACL checks, so a 386 error against a restricted table (e.g. system.query_log) leaks column types. Move 386 to the unsafe list so the raw message is suppressed in prod, and add a parallel test pinning the behavior against system.query_log.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts`:
- Around line 240-243: The two no-op guards using response and errorText
(expect(errorText).not.toContain("@") and
expect(errorText).not.toMatch(/primary_email\s*[:=]\s*['"]/)) should be removed
or replaced: either drop them and rely on the existing inline snapshot assertion
as the source of truth, or replace them with a structural assertion that the
response body contains only the echoed query and column/type list (no row
values) — locate the test where response and errorText are created in
analytics-query.test.ts and update the expectations accordingly to assert
equality/structure against the inline snapshot or to explicitly verify that no
content appears outside the echoed SELECT and the type information.
- Around line 235-268: The test only covers dev-mode behavior and doesn't assert
that error code 386 is treated as UNSAFE in production; add a test that invokes
getSafeClickhouseErrorMessage (or the public wrapper that uses it) with a
simulated 386 ClickHouse error and a sample query under production-like
conditions (e.g., set NODE_ENV=production or toggle the env flag used by the
sanitizer) and assert it returns DEFAULT_CLICKHOUSE_ERROR_MESSAGE instead of the
raw error text; ensure the new test references getSafeClickhouseErrorMessage (or
the API endpoint that calls it) and the DEFAULT_CLICKHOUSE_ERROR_MESSAGE
constant so the behavior change is validated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eac1baba-a68b-4b3b-9880-14a9cfc5565b
📒 Files selected for processing (2)
apps/backend/src/lib/clickhouse-errors.tsapps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/backend/src/lib/clickhouse-errors.ts
Summary
386(NO_COMMON_TYPE) toUNSAFE_CLICKHOUSE_ERROR_CODESinapps/backend/src/lib/clickhouse-errors.ts. This stops the SentryStackAssertionError(Unknown Clickhouse error: code 386 not in safe or unsafe codes) that was firing whenever an admin wrote a query likeSELECT [1, 'a']orSELECT if(1, 'a', 1), while keeping the raw error message out of prod responses.analytics_internal.userstable, and one againstsystem.query_log, to pin that 386 is wrapped with the genericError during execution of this query.message in prod (full detail only surfaces in dev/test).Why unsafe, not safe
Both callers of
getSafeClickhouseErrorMessage(apps/backend/src/app/api/latest/internal/analytics/query/route.ts:59andapps/backend/src/lib/ai/tools/sql-query.ts:80) execute caller-authored SQL underreadonly: "1"withSQL_project_id/SQL_branch_idscoping. The ClickHouse client runs under alimited_userwhose grants restrict most tables — but ClickHouse resolves types before enforcing ACL. That means a query likeSELECT if(1, query, 1) FROM system.query_logsurfaces code 386 with a message likeThere is no supertype for types String, UInt8 ..., leaking thatsystem.query_log.queryis aString— schema info from a table the caller can't actually read.This is the same type-before-ACL class as code 43 (
ILLEGAL_TYPE_OF_ARGUMENT), which is already classified unsafe. Classifying 386 as unsafe keeps the defense-in-depth consistent: if per-customer tables are ever introduced and grants don't block reference-resolution in time, 386 won't leak their schema.Cost: in prod, an admin writing a malformed type-mismatch query sees only
Error during execution of this query.instead of the supertype hint. Dev and test environments still show the full error via the existinggetNodeEnvironment()branch, so local iteration is unaffected.Test plan
pnpm test run apps/e2e/tests/backend/endpoints/api/v1/analytics-query.test.ts— all 64 tests pass, including the two 386 regression tests.unknown-clickhouse-error-for-queryevents for code 386 stop firing.Summary by CodeRabbit
Bug Fixes
Tests